Compressed Sensing (CS) is a new sampling/data acquisition 
theory based on the existence of a sparse representation of a 
signal and a projected dictionary PD, where $\mathbf{P} \in \mathbb{R}^{m \times d}$ is the 
projection matrix and $\mathbf{D} \in \mathbb{R}^{d \times n}$ is the dictionary. To exactly 
recover the signal with a small number m of measurements, 
it is expected that the projected dictionary PD is of low 
mutual coherence. Several previous methods attempt to find the 
projection P such that the mutual coherence of PD is as low as 
possible. However, they do not minimize the mutual coherence 
directly and thus these methods may be far from optimal. Also, 
the solvers they use lack convergence guarantee and thus the 
quality of their obtained solutions is not guaranteed. This work 
aims to address these issues. We propose to find an optimal 
projection matrix by minimizing the mutual coherence of PD 
directly. This leads to a nonconvex nonsmooth minimization 
problem. We then approximate it by smoothing and solve it 
by alternating minimization. We further prove the convergence 
of our algorithm. To the best of our knowledge, this is the 
first work which directly minimizes the mutual coherence of the 
projected dictionary and has a convergence guarantee. Numerical 
experiments demonstrate that the proposed method can recover 
sparse signals better than existing ones. 

I. Introduction 

Compressed Sensing (CS) [1], [2] is a new sampling/data 
acquisition theory asserting that one can exploit sparsity 
or compressibility when acquiring signals of interest. It shows 
that signals which have a sparse representation with respect 
to appropriate bases can be recovered from a small number 
of measurements. A fundamental problem in CS is how to 
construct a measurement matrix such that the number of 
measurements is near minimal. 

Consider a signal $\mathbf{x} \in \mathbb{R}^d$ which is assumed to have a sparse representation with respect to a fixed overcomplete dictionary $\mathbf{D} \in \mathbb{R}^{d \times n}$ ($d < n$). This can be described as

$$\mathbf{x} = \mathbf{D}\boldsymbol{\alpha}, \tag{1}$$

where $\boldsymbol{\alpha} \in \mathbb{R}^n$ is a sparse representation coefficient, i.e., $\|\boldsymbol{\alpha}\|_0 \ll n$. Here $\|\boldsymbol{\alpha}\|_0$ denotes the $\ell_0$-norm, which counts the number of nonzero elements in $\boldsymbol{\alpha}$. The solution to problem (1) is not unique since $d < n$. To find an appropriate solution in the solution set of (1), we need to use some additional structures of $\mathbf{D}$ and $\boldsymbol{\alpha}$. Considering that $\boldsymbol{\alpha}$ is sparse, we are interested in finding the sparsest representation coefficient $\boldsymbol{\alpha}$. This leads to the following sparse representation problem

$$\min_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_0, \quad \text{s.t. } \mathbf{x} = \mathbf{D}\boldsymbol{\alpha}. \tag{2}$$

However, the above problem is NP-hard [3] and thus is challenging to solve. Some algorithms, such as Basis Pursuit (BP) [4] and Orthogonal Matching Pursuit (OMP) [5], can be used to find suboptimal solutions.
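To make the greedy strategy concrete, the following is a minimal sketch of OMP for problem (2), assuming NumPy is available. The function name `omp` and its parameters `sparsity` and `tol` are illustrative choices and are not taken from the paper.

```python
import numpy as np

def omp(D, x, sparsity, tol=1e-10):
    """Greedy OMP sketch: repeatedly pick the column most correlated with
    the residual, then refit all selected coefficients by least squares."""
    n = D.shape[1]
    norms = np.linalg.norm(D, axis=0)
    support = []
    alpha = np.zeros(n)
    residual = np.asarray(x, dtype=float).copy()
    for _ in range(sparsity):
        if np.linalg.norm(residual) <= tol:
            break
        corr = np.abs(D.T @ residual) / norms
        if support:
            corr[support] = 0.0  # never reselect a chosen column
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        alpha = np.zeros(n)
        alpha[support] = coef
        residual = x - D @ alpha
    return alpha
```

The least-squares refit over the whole selected support is what distinguishes OMP from plain matching pursuit.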

An interesting theoretical problem is under what conditions the optimal solution to (2) can be computed. If the solution is computable, can it be exactly or approximately computed by BP or OMP? Some previous works answer the above questions based on the mutual coherence of the dictionary $\mathbf{D}$ [6].

Definition 1: Given $\mathbf{D} = [\mathbf{d}_1, \cdots, \mathbf{d}_n] \in \mathbb{R}^{d \times n}$, its mutual coherence is defined as the largest absolute and normalized inner product between different columns of $\mathbf{D}$, i.e.,

$$\mu(\mathbf{D}) = \max_{1 \le i, j \le n,\ i \ne j} \frac{|\mathbf{d}_i^T \mathbf{d}_j|}{\|\mathbf{d}_i\| \, \|\mathbf{d}_j\|}.$$

The mutual coherence measures the highest correlation between any two columns of D. It is expected to be as low as possible in order to find the sparsest solution to (2).
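Definition 1 translates directly into a few lines of NumPy. The sketch below is illustrative only; the function name `mutual_coherence` is an assumption and not part of the paper.

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute normalized inner product between distinct columns."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm columns
    G = np.abs(Dn.T @ Dn)                               # absolute Gram matrix
    np.fill_diagonal(G, 0.0)                            # ignore the i == j terms
    return float(G.max())
```

For an orthonormal basis the value is 0, and it approaches 1 as two columns become nearly parallel.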

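Theorem 1: [3], [7], [8] For problem (1), if $\boldsymbol{\alpha}$ satisfies

$$\|\boldsymbol{\alpha}\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(\mathbf{D})}\right), \tag{3}$$

then the following results hold:

• $\boldsymbol{\alpha}$ is the solution to (2).

• $\boldsymbol{\alpha}$ is also the solution to the following convex $\ell_1$-minimization problem

$$\min_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_1, \quad \text{s.t. } \mathbf{x} = \mathbf{D}\boldsymbol{\alpha},$$

where $\|\boldsymbol{\alpha}\|_1 = \sum_i |\alpha_i|$ is the $\ell_1$-norm of $\boldsymbol{\alpha}$.

• $\boldsymbol{\alpha}$ can be obtained by OMP.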

The above theorem shows that if the mutual coherence of D 
is low enough, then the sparsest solution to (2) is computable. 
Thus, how to construct a dictionary D with low mutual 
coherence is crucial in sparse coding. In CS, to reduce the 
number of measurements, we face a similar problem on the 
sensing matrix construction. 
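As a quick numerical reading of the bound (3), the sketch below returns the largest integer sparsity level that still satisfies the strict inequality. The helper name `max_guaranteed_sparsity` is hypothetical and used only for illustration.

```python
import math

def max_guaranteed_sparsity(mu):
    """Largest integer s with s < 0.5 * (1 + 1/mu), for 0 < mu <= 1."""
    bound = 0.5 * (1.0 + 1.0 / mu)
    return math.ceil(bound) - 1  # subtract 1 because the inequality is strict
```

For example, a coherence of 0.5 only guarantees recovery of 1-sparse coefficients, while a coherence of 0.1 raises the guaranteed level to 5. This is why lowering the coherence of the effective dictionary matters.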

The theory of CS guarantees that a signal having a sparse 
representation can be recovered exactly from a small set of 
linear and nonadaptive measurements. This result suggests 
that it may be possible to sense sparse signals by taking 
far fewer measurements than what the conventional Nyquist- 
Shannon sampling theorem requires. But note that CS differs 
from classical sampling in several aspects. First, the sampling 
theory typically considers infinite-length and continuous-time 
signals. In contrast, CS is a mathematical theory that focuses 
on measuring finite-dimensional vectors in $\mathbb{R}^n$. Second, rather than sampling the signal at specific points in time, CS systems typically acquire measurements in the form of inner products between the signal and general test functions. Finally, the ways of dealing with signal recovery are different. Given the signal $\mathbf{x} \in \mathbb{R}^d$ in (1), CS suggests replacing these $d$ direct samples with $m$ indirect ones by measuring linear projections of $\mathbf{x}$ defined by a proper projection or sensing matrix $\mathbf{P} \in \mathbb{R}^{m \times d}$, i.e.,

$$\mathbf{y} = \mathbf{P}\mathbf{x}, \tag{4}$$

such that $m \ll d$. It means that instead of sensing all $d$ elements of the original signal x, we can sense x indirectly by its compressed form y of a much smaller size $m$. Surprisingly, the original signal x can be recovered from the observed y by using the sparse representation in (1), i.e., $\mathbf{y} = \mathbf{P}\mathbf{D}\boldsymbol{\alpha}$ with the sparsest $\boldsymbol{\alpha}$. Thus the reconstruction requires solving the following problem

$$\min_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_0, \quad \text{s.t. } \mathbf{y} = \mathbf{M}\boldsymbol{\alpha}, \tag{5}$$

where $\mathbf{M} = \mathbf{P}\mathbf{D} \in \mathbb{R}^{m \times n}$ is called the effective dictionary. Problem (5) is also NP-hard. As suggested by Theorem 1, if the mutual coherence of PD is low enough, then the solution $\boldsymbol{\alpha}$ to (5) is computable by OMP or by solving the following convex problem

$$\min_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_1, \quad \text{s.t. } \mathbf{y} = \mathbf{M}\boldsymbol{\alpha}. \tag{6}$$

Finally, the original signal x can be reconstructed by $\mathbf{x} = \mathbf{D}\boldsymbol{\alpha}$. So it is expected to find a proper projection matrix P such that $\mu(\mathbf{PD})$ is low. Furthermore, many previous works [9], [10] show that the required number of measurements for recovering the signal x by CS can be reduced if $\mu(\mathbf{PD})$ is low.

In summary, the above discussions imply that by choosing an appropriate projection matrix P such that $\mu(\mathbf{PD})$ is low enough, the true signal x can be recovered with high probability by efficient algorithms. At the beginning, random projection matrices were shown to be good choices since their columns are incoherent with any fixed basis D with high probability [11]. However, many previous works [9], [10], [12] show that well designed deterministic projection matrices can often lead to better performance of signal reconstruction than random projections do. In this work, we focus on the construction of deterministic projection matrices. We first give a brief review of some previous deterministic methods.

A. Related Work

In this work, we only consider the case that D is fixed while P can be changed. Our target is to find P by minimizing $\mu(\mathbf{M})$, where M = PD. If each column of M is normalized to have unit Euclidean length, then $\mu(\mathbf{M}) = \|\mathbf{G}\|_{\infty,\text{off}}$, where $\mathbf{G} = (g_{ij}) = \mathbf{M}^T\mathbf{M}$ is named the Gram matrix and $\|\mathbf{G}\|_{\infty,\text{off}} = \max_{i \ne j} |g_{ij}|$ is the largest off-diagonal element of $|\mathbf{G}|$. Several previous works used the Gram matrix to find the projection matrix P [9], [10], [12]. We give a review of these methods in the following.

1) The Algorithm of Elad

The algorithm of Elad [9] considers minimizing the $t$-averaged mutual coherence defined as the average of the absolute and normalized inner products between different columns of M which are above $t$, i.e.,

$$\mu_t(\mathbf{M}) = \frac{\sum_{1 \le i, j \le n,\ i \ne j} \chi_t(|g_{ij}|)\,|g_{ij}|}{\sum_{1 \le i, j \le n,\ i \ne j} \chi_t(|g_{ij}|)},$$

where $\chi_t(x)$ is the characteristic function defined as

$$\chi_t(x) = \begin{cases} 1, & \text{if } x \ge t, \\ 0, & \text{otherwise,} \end{cases}$$

and $t$ is a fixed threshold which controls the top fraction of the matrix elements of $|\mathbf{G}|$ that are to be considered.

To find P by minimizing $\mu_t(\mathbf{M})$, some properties of the Gram matrix $\mathbf{G} = \mathbf{M}^T\mathbf{M}$ are used. Assume that each column of M is normalized to have unit Euclidean length. Then

$$\operatorname{diag}(\mathbf{G}) = \mathbf{1}, \tag{7}$$

$$\operatorname{rank}(\mathbf{G}) = m. \tag{8}$$

The work [9] proposed to minimize $\mu_t(\mathbf{M})$ by iteratively updating P as follows. First, initialize P as a random matrix and normalize each column of PD to have unit Euclidean length. Second, shrink the elements of $\mathbf{G} = \mathbf{M}^T\mathbf{M}$ (where M = PD) by

$$g_{ij} = \begin{cases} \gamma g_{ij}, & \text{if } |g_{ij}| \ge t, \\ \gamma t \operatorname{sign}(g_{ij}), & \text{if } t > |g_{ij}| \ge \gamma t, \\ g_{ij}, & \text{if } \gamma t > |g_{ij}|, \end{cases}$$

where $0 < \gamma < 1$ is a down-scaling factor. Third, apply SVD and reduce the rank of G to be equal to $m$. At last, build the square root S of G: $\mathbf{S}^T\mathbf{S} = \mathbf{G}$, where $\mathbf{S} \in \mathbb{R}^{m \times n}$, and find $\mathbf{P} = \mathbf{S}\mathbf{D}^\dagger$, where $\dagger$ denotes the Moore-Penrose pseudoinverse.

There are several limitations of the algorithm of Elad. First, it is suboptimal since the $t$-averaged mutual coherence $\mu_t(\mathbf{M})$ is different from the mutual coherence $\mu(\mathbf{M})$ which is our real target. Second, the proposed algorithm to minimize $\mu_t(\mathbf{M})$ has no convergence guarantee$^1$. So the quality of the obtained solution is not guaranteed. Third, the choices of two parameters, $t$ and $\gamma$, are crucial for the signal recovery performance in CS. However, there is no guideline for their settings and thus in practice it is usually difficult to find their best choices.
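The two ingredients of Elad's method described above, the $t$-averaged mutual coherence and the entrywise shrinkage of the Gram matrix, can be sketched as follows. The function names are illustrative, and leaving the diagonal of G untouched during shrinkage is an assumption made here so that (7) continues to hold.

```python
import numpy as np

def t_averaged_coherence(M, t):
    """Mean of the off-diagonal |g_ij| that are >= t (0.0 if none qualify)."""
    Mn = M / np.linalg.norm(M, axis=0, keepdims=True)
    G = np.abs(Mn.T @ Mn)
    off = G[~np.eye(G.shape[0], dtype=bool)]
    selected = off[off >= t]
    return float(selected.mean()) if selected.size else 0.0

def shrink_gram(G, t, gamma):
    """Entrywise shrinkage of the Gram matrix with threshold t and factor gamma."""
    A = np.abs(G)
    out = np.where(A >= t, gamma * G,
                   np.where(A >= gamma * t, gamma * t * np.sign(G), G))
    np.fill_diagonal(out, np.diag(G))  # assumption: keep the diagonal fixed
    return out
```

Both functions depend on the hand-tuned parameters `t` and `gamma`, which is the third limitation listed above.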

2) The Algorithm of Duarte-Carvajalino and Sapiro

The algorithm of Duarte-Carvajalino and Sapiro [12] is not a method that is based on mutual coherence. It instead aims to find the sensing matrix P such that the corresponding Gram matrix is as close to the identity matrix as possible, i.e.,

$$\mathbf{G} = \mathbf{M}^T\mathbf{M} = \mathbf{D}^T\mathbf{P}^T\mathbf{P}\mathbf{D} \approx \mathbf{I}, \tag{9}$$

where I denotes the identity matrix. Multiplying both sides of the previous expression by $\mathbf{D}$ on the left and $\mathbf{D}^T$ on the right, it becomes

$$\mathbf{D}\mathbf{D}^T\mathbf{P}^T\mathbf{P}\mathbf{D}\mathbf{D}^T \approx \mathbf{D}\mathbf{D}^T. \tag{10}$$

Let $\mathbf{D}\mathbf{D}^T = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^T$ be the eigen-decomposition of $\mathbf{D}\mathbf{D}^T$. Then (10) is equivalent to

$$\boldsymbol{\Lambda}\mathbf{V}^T\mathbf{P}^T\mathbf{P}\mathbf{V}\boldsymbol{\Lambda} \approx \boldsymbol{\Lambda}. \tag{11}$$

Define $\boldsymbol{\Gamma} = \mathbf{P}\mathbf{V}$. Then they finally formulate the following model w.r.t. $\boldsymbol{\Gamma}$

$$\min_{\boldsymbol{\Gamma}} \|\boldsymbol{\Lambda} - \boldsymbol{\Lambda}\boldsymbol{\Gamma}^T\boldsymbol{\Gamma}\boldsymbol{\Lambda}\|_F^2. \tag{12}$$

After solving the above problem, the projection matrix can be obtained as $\mathbf{P} = \boldsymbol{\Gamma}\mathbf{V}^T$.

However, the signal recovery performance of the algorithm of Duarte-Carvajalino and Sapiro is usually not very good. The reason is that M is overcomplete and the Gram matrix G cannot be an identity matrix. In this case, simply minimizing the difference between the Gram matrix G and the identity matrix does not imply a solution M with low mutual coherence.

$^1$ In this paper, saying that an algorithm converges means that any accumulation point is a stationary point.

3) The Algorithm of Xu et al.

The algorithm of Xu et al. [10] is motivated by the well-known Welch bound [13]. For any $\mathbf{M} \in \mathbb{R}^{m \times n}$, the mutual coherence $\mu(\mathbf{M})$ is lower bounded, i.e.,

$$\mu(\mathbf{M}) \ge \sqrt{\frac{n - m}{m(n - 1)}}. \tag{13}$$

The algorithm of Xu et al. aims to find M such that the off-diagonal elements of $\mathbf{G} = \mathbf{M}^T\mathbf{M}$ approximate the Welch bound well. They proposed to solve the following problem

$$\min \|\mathbf{G} - \mathbf{G}_\Lambda\|_F, \quad \text{s.t. } \mathbf{G}_\Lambda = \mathbf{G}_\Lambda^T,\ \operatorname{diag}(\mathbf{G}_\Lambda) = \mathbf{1},\ \|\mathbf{G}_\Lambda\|_{\infty,\text{off}} \le \mu_W, \tag{14}$$

where $\mu_W = \sqrt{\frac{n - m}{m(n - 1)}}$. The proposed iterative solver for the above problem is similar to the algorithm of Elad. The main difference is the shrinkage function used to control the elements of G. See [10] for more details.

However, their proposed solver in [10] for (14) also lacks a convergence guarantee. Another issue is that, for $\mathbf{M} \in \mathbb{R}^{m \times n}$, the Welch bound (13) is not tight when $n$ is large. Actually, the equality of (13) can hold only when $n \le \frac{m(m+1)}{2}$. This implies that the algorithm of Xu et al. is not optimal when $n > \frac{m(m+1)}{2}$.
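The Welch bound (13) and the condition under which it can be attained are easy to evaluate numerically. The following sketch uses hypothetical helper names and only the formulas stated above.

```python
import math

def welch_bound(m, n):
    """Lower bound (13) on the mutual coherence of any m x n matrix, n > m >= 1."""
    return math.sqrt((n - m) / (m * (n - 1)))

def welch_bound_attainable(m, n):
    """Necessary condition for equality in (13): n <= m(m+1)/2."""
    return n <= m * (m + 1) // 2
```

For instance, with $n = 60$ the condition fails for $m = 10$ (since $10 \cdot 11 / 2 = 55 < 60$) and first holds at $m = 11$, so the bound cannot be attained for small numbers of measurements.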


B. Contributions 

There are at least two main issues in the previous methods reviewed above. First, none of them aims to find P by directly minimizing $\mu(\mathbf{PD})$, which is our real target. Thus the objectives of these methods are not optimal. For their obtained solutions P, $\mu(\mathbf{PD})$ is usually much larger than the Welch bound in (13). Second, the algorithms of Elad and Xu et al. have no convergence guarantee and thus they may produce very different solutions given slightly different initializations. The convergence issue may limit their applications in CS.

To address the above issues, we develop Direct Mutual Coherence Minimization (DMCM) models. First, we show how to construct a low mutual coherence matrix M by minimizing $\mu(\mathbf{M})$ directly. This leads to a nonconvex and nonsmooth problem. To solve our new problem efficiently, we first smooth the objective function such that its gradient is Lipschitz continuous. Then we solve the approximate problem by proximal gradient, which has a convergence guarantee. Second, inspired by DMCM, we propose a DMCM based Projection (DMCM-P) model which aims to find a projection P by minimizing $\mu(\mathbf{PD})$ directly. To solve the nonconvex DMCM-P problem, we then propose an alternating minimization method and prove its convergence. Experimental results show that our DMCM-P achieves the lowest mutual coherence of PD and also leads to the best signal recovery performance.


II. Low Mutual Coherence Matrix Construction 


In this section, we show how to construct a matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$ with low mutual coherence $\mu(\mathbf{M})$ by DMCM. Assume that each column of M is normalized to unit Euclidean length. Then we aim to find M by the following DMCM model

$$\min_{\mathbf{M} \in \mathbb{R}^{m \times n}} \mu(\mathbf{M}) = \|\mathbf{M}^T\mathbf{M}\|_{\infty,\text{off}}, \quad \text{s.t. } \|\mathbf{M}_i\|_2 = 1,\ i = 1, \cdots, n, \tag{15}$$

where $\mathbf{M}_i$ (or $(\mathbf{M})_i$) denotes the $i$-th column of M. The above problem is equivalent to

$$\min_{\mathbf{M} \in \mathbb{R}^{m \times n}} f(\mathbf{M}) = \|\mathbf{M}^T\mathbf{M} - \mathbf{I}\|_\infty, \quad \text{s.t. } \|\mathbf{M}_i\|_2 = 1,\ i = 1, \cdots, n, \tag{16}$$

where $\|\mathbf{A}\|_\infty = \max_{i,j} |a_{ij}|$ denotes the $\ell_\infty$-norm of A. Solving the above problem is not easy since it is nonconvex and its objective is nonsmooth. In general, due to the nonconvexity, the globally optimal solution to (16) is not computable. We instead consider finding a locally optimal solution with convergence guarantee.

First, to ease the problem, we adopt the smoothing technique in [14] to smooth the nonsmooth $\ell_\infty$-norm in the objective of (16). By the fact that the $\ell_1$-norm is the dual norm of the $\ell_\infty$-norm, the objective function in (16) can be rewritten as

$$f(\mathbf{M}) = \|\mathbf{M}^T\mathbf{M} - \mathbf{I}\|_\infty = \max_{\|\mathbf{V}\|_1 \le 1} \langle \mathbf{M}^T\mathbf{M} - \mathbf{I}, \mathbf{V} \rangle,$$

where $\|\mathbf{V}\|_1 = \sum_{i,j} |v_{ij}|$ denotes the $\ell_1$-norm of V. Since $\{\mathbf{V} : \|\mathbf{V}\|_1 \le 1\}$ is a bounded convex set, we can define a proximal function $d(\mathbf{V})$ for this set, where $d(\mathbf{V})$ is continuous and strongly convex on this set. A natural choice of $d(\mathbf{V})$ is $d(\mathbf{V}) = \frac{1}{2}\|\mathbf{V}\|_F^2$, where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. Hence, we have the following smooth approximation of $f$ defined in (16):

$$f_\rho(\mathbf{M}) = \max_{\|\mathbf{V}\|_1 \le 1} \langle \mathbf{M}^T\mathbf{M} - \mathbf{I}, \mathbf{V} \rangle - \frac{\rho}{2}\|\mathbf{V}\|_F^2, \tag{17}$$

where $\rho > 0$ is a smoothing parameter. Note that the smooth function $f_\rho$ can approximate the nonsmooth $f$ with arbitrary precision and is easier to minimize. Indeed, $f$ and $f_\rho$ have the following relationship

$$f_\rho(\mathbf{M}) \le f(\mathbf{M}) \le f_\rho(\mathbf{M}) + \rho\gamma,$$

where $\gamma = \max_{\mathbf{V}} \{\frac{1}{2}\|\mathbf{V}\|_F^2 : \|\mathbf{V}\|_1 \le 1\}$. For any $\epsilon > 0$, if we choose $\rho = \epsilon/\gamma$, then $|f(\mathbf{M}) - f_\rho(\mathbf{M})| \le \epsilon$. This implies 
that if $\rho$ is sufficiently small, then the difference between $f$ and $f_\rho$ can be very small. This motivates us to use $f_\rho$ to replace $f$ in (16) and thus we have the following relaxed problem

$$\min_{\mathbf{M} \in \mathbb{R}^{m \times n}} f_\rho(\mathbf{M}), \quad \text{s.t. } \|\mathbf{M}_i\|_2 = 1,\ i = 1, \cdots, n. \tag{18}$$

As $f_\rho$ can approximate $f$ at an arbitrary precision, solving (18) can still be regarded as directly minimizing the mutual coherence. Problem (18) is easier to solve since $\nabla f_\rho(\mathbf{M}) = \mathbf{M}(\mathbf{V}^* + \mathbf{V}^{*T})$, where $\mathbf{V}^*$ is the optimal solution to (17), is Lipschitz continuous. That is, for any $\mathbf{M}_1, \mathbf{M}_2 \in \mathbb{R}^{m \times n}$, there exists a constant $L = 1/\rho$ such that

$$\|\nabla f_\rho(\mathbf{M}_1) - \nabla f_\rho(\mathbf{M}_2)\|_F \le L \|\mathbf{M}_1 - \mathbf{M}_2\|_F.$$

With the above property, problem (18) can be solved by the proximal gradient method which updates M in the $(k+1)$-th iteration by

$$\begin{aligned} \mathbf{M}_{k+1} &= \arg\min_{\mathbf{M}} \ \langle \nabla f_\rho(\mathbf{M}_k), \mathbf{M} - \mathbf{M}_k \rangle + \frac{1}{2\alpha}\|\mathbf{M} - \mathbf{M}_k\|_F^2 \\ &= \arg\min_{\mathbf{M}} \ \frac{1}{2}\|\mathbf{M} - (\mathbf{M}_k - \alpha\nabla f_\rho(\mathbf{M}_k))\|_F^2, \\ &\quad \text{s.t. } \|\mathbf{M}_i\|_2 = 1,\ i = 1, \cdots, n, \end{aligned} \tag{19}$$

where $\alpha > 0$ is the step size. To guarantee convergence, it is required that $\alpha < \rho$. In this work, we simply set $\alpha = 0.99\rho$. The above problem has a closed form solution by normalizing each column of $\mathbf{M}_k - \alpha\nabla f_\rho(\mathbf{M}_k)$, i.e.,

$$(\mathbf{M}_{k+1})_i = \frac{(\mathbf{M}_k - \alpha\nabla f_\rho(\mathbf{M}_k))_i}{\|(\mathbf{M}_k - \alpha\nabla f_\rho(\mathbf{M}_k))_i\|_2}. \tag{20}$$

To compute $\nabla f_\rho(\mathbf{M}_k) = \mathbf{M}_k(\mathbf{V}_k + \mathbf{V}_k^T)$, where $\mathbf{V}_k$ is optimal to (17) when $\mathbf{M} = \mathbf{M}_k$, one has to solve (17), which is equivalent to the following problem

$$\mathbf{V}_k = \arg\min_{\mathbf{V}} \ \frac{1}{2}\|\mathbf{V} - (\mathbf{M}_k^T\mathbf{M}_k - \mathbf{I})/\rho\|_F^2, \quad \text{s.t. } \|\mathbf{V}\|_1 \le 1. \tag{21}$$

Solving the above problem requires computing a proximal projection onto the $\ell_1$ ball. This can be done efficiently by the method in [15].

Iteratively updating V by (21) and M by (19) leads to the Proximal Gradient (PG) algorithm for solving problem (18). We summarize the whole procedure of PG for (18) in Algorithm 1. It can be easily seen that the per-iteration cost of Algorithm 1 is $O(m^2 n + m n^2)$. For nonconvex problems, e.g., (18), it was proved that PG is guaranteed to converge.

Theorem 2: Let $\{\mathbf{M}_k\}$ be the sequence generated by PG in Algorithm 1. Then $\{\mathbf{M}_k\}$ is bounded and any accumulation point $\mathbf{M}^*$ of $\{\mathbf{M}_k\}$ is a stationary point.

Algorithm 1 Solve (18) by Proximal Gradient algorithm.
Initialize: $k = 0$, $\mathbf{M}_k \in \mathbb{R}^{m \times n}$, $\rho > 0$, $\alpha = 0.99\rho$, $K > 0$.
Output: $\mathbf{M}^* = \text{PG}(\mathbf{M}_k, \rho)$.
while $k < K$ do
  1) Compute $\mathbf{V}_k$ by solving (21);
  2) Compute $\mathbf{M}_{k+1}$ by solving (19);
  3) $k = k + 1$.
end while

Though PG is guaranteed to converge, the obtained suboptimal solution to (18) may be far from optimal to problem (16), which is our original target. There are two important factors which may affect the quality of the obtained solution by PG. First, due to the nonconvexity of (18), the solution may be sensitive to the initialization of M. Second, the smoothing parameter $\rho > 0$ should be small so that the objective $f_\rho$ in (18) can well approximate the objective $f$ in (16). However, if $\rho$ is directly set to a very small value, PG may decrease the objective function value of (18) very slowly. This can be easily seen from the updating of M in (19), where $\alpha < \rho$. To address the above two issues, we use a continuation trick to find a better solution to (16) by solving (18) with different initializations. Namely, we begin with a relatively large value of $\rho$ and reduce it gradually. For each fixed $\rho$, we solve (18) by PG in Algorithm 1 and use its solution as a new initialization of M in PG. To achieve a better solution, we repeat the above procedure $T$ times or until $\rho$ reaches a predefined small value $\rho_{\min}$. We summarize the procedure of PG with the continuation trick in Algorithm 2.

Algorithm 2 Solve (18) by PG with continuation trick.
Initialize: $\rho > 0$, $\alpha = 0.99\rho$, $\eta > 1$, $\mathbf{M}$, $t = 0$, $T > 0$.
while $t < T$ do
  1) $\mathbf{M} = \text{PG}(\mathbf{M}, \rho)$ by calling Algorithm 1;
  2) $\rho = \rho/\eta$, $\alpha = 0.99\rho$;
  3) $t = t + 1$.
end while

Finally, we would like to emphasize some advantages of our DMCM model (16) and the proposed solver. A main merit of our model (16) is that it minimizes the mutual coherence $\mu(\mathbf{M})$ directly and thus the mutual coherence of its optimal solution can be low. Though the optimal solution is in general not computable due to the nonconvexity of (16), our proposed solver, which first smooths the objective and then minimizes it by PG, has a convergence guarantee. To the best of our knowledge, this is the first work which directly minimizes the mutual coherence of a matrix with convergence guarantee.
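The PG iteration with continuation can be sketched compactly in NumPy. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the $\ell_1$-ball projection uses a standard sort-based method standing in for the method cited in the text, and the default parameters mirror the values reported in the experiments ($\rho_0 = 0.5$, $\eta = 1.2$, 15 outer and 1000 inner iterations).

```python
import numpy as np

def project_l1_ball(A, radius=1.0):
    """Euclidean projection of A onto {V : sum |v_ij| <= radius} (sort-based)."""
    a = np.abs(A).ravel()
    if a.sum() <= radius:
        return A.copy()
    u = np.sort(a)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, u.size + 1)
    idx = np.nonzero(u * k > css - radius)[0][-1]
    theta = (css[idx] - radius) / (idx + 1.0)
    return np.sign(A) * np.maximum(np.abs(A) - theta, 0.0)

def normalize_columns(M):
    return M / np.linalg.norm(M, axis=0, keepdims=True)

def pg(M, rho, iters):
    """Sketch of Algorithm 1: proximal gradient on the smoothed objective."""
    n = M.shape[1]
    alpha = 0.99 * rho
    for _ in range(iters):
        V = project_l1_ball((M.T @ M - np.eye(n)) / rho)  # subproblem (21)
        grad = M @ (V + V.T)                              # gradient of f_rho
        M = normalize_columns(M - alpha * grad)           # updates (19)-(20)
    return M

def dmcm(M0, rho0=0.5, eta=1.2, outer=15, inner=1000):
    """Sketch of Algorithm 2: continuation over the smoothing parameter rho."""
    M, rho = normalize_columns(M0), rho0
    for _ in range(outer):
        M = pg(M, rho, inner)
        rho /= eta
    return M
```

A typical call would be `dmcm(np.random.default_rng(0).standard_normal((m, n)))`. Each outer pass warm-starts PG from the previous solution, which is the point of the continuation trick.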

III. Low Mutual Coherence Based Projection 

In this section, we show how to find a projection matrix P such that $\mu(\mathbf{PD})$ can be as low as possible. This is crucial for signal recovery by CS associated to problem (5). Similar to the DMCM model shown in (16), an ideal way is to minimize $\mu(\mathbf{PD})$ directly, i.e.,

$$\min_{\mathbf{P} \in \mathbb{R}^{m \times d}} \|(\mathbf{PD})^T(\mathbf{PD}) - \mathbf{I}\|_\infty, \quad \text{s.t. } \|\mathbf{P}\mathbf{D}_i\|_2 = 1,\ i = 1, \cdots, n. \tag{22}$$

However, the constraint of (22) is more complex than the one in (16), and thus (22) is much more challenging to solve. We instead consider an approximate model of (22) based on the following observation.

Theorem 3: For any $\mathbf{M}_1, \mathbf{M}_2 \in \mathbb{R}^{m \times n}$, if $\mathbf{M}_1 \to \mathbf{M}_2$, then $\mu(\mathbf{M}_1) \to \mu(\mathbf{M}_2)$.

It is easy to prove the above result by the definition of the mutual coherence of a matrix. The above theorem indicates that the difference of the mutual coherences of two matrices is small when the difference of two matrices is small. This motivates us to find M such that $\mu(\mathbf{M})$ is low and the difference between M and PD is small. So we have the following approximate model of (22):

$$\min_{\mathbf{P} \in \mathbb{R}^{m \times d},\ \mathbf{M} \in \mathbb{R}^{m \times n}} \|\mathbf{M}^T\mathbf{M} - \mathbf{I}\|_\infty + \frac{1}{2\beta}\|\mathbf{M} - \mathbf{PD}\|_F^2, \quad \text{s.t. } \|\mathbf{M}_i\|_2 = 1,\ i = 1, \cdots, n, \tag{23}$$

where $\beta > 0$ trades off $\mu(\mathbf{M})$ and the difference between M and PD. To distinguish from the DMCM model in (16), in this paper we name the above model DMCM based Projection (DMCM-P).

Now we show how to solve (23). First, we smooth $\|\mathbf{M}^T\mathbf{M} - \mathbf{I}\|_\infty$ as $f_\rho(\mathbf{M})$ defined in (17). Then problem (23) can be approximated by the following problem with a smooth objective:

$$\min_{\mathbf{P}, \mathbf{M}} F(\mathbf{M}, \mathbf{P}) = f_\rho(\mathbf{M}) + \frac{1}{2\beta}\|\mathbf{M} - \mathbf{PD}\|_F^2, \quad \text{s.t. } \|\mathbf{M}_i\|_2 = 1,\ i = 1, \cdots, n. \tag{24}$$

When both $\rho$ and $\beta$ are small, $f_\rho$ is very close to $f$. So is $\mu(\mathbf{PD})$ to $\mu(\mathbf{M})$ because $\|\mathbf{M} - \mathbf{PD}\|_F$ has to be small. Thus solving problem (24) can still be regarded as minimizing the mutual coherence directly. We propose to alternately update P and M to solve problem (24).

1. Fix $\mathbf{P} = \mathbf{P}_k$ and update M by

$$\begin{aligned} \mathbf{M}_{k+1} &= \arg\min_{\mathbf{M}} \ \langle \nabla f_\rho(\mathbf{M}_k), \mathbf{M} - \mathbf{M}_k \rangle + \frac{1}{2\alpha}\|\mathbf{M} - \mathbf{M}_k\|_F^2 + \frac{1}{2\beta}\|\mathbf{M} - \mathbf{P}_k\mathbf{D}\|_F^2 \\ &= \arg\min_{\mathbf{M}} \ \frac{1}{2}\left\|\mathbf{M} - \frac{\frac{1}{\alpha}\mathbf{M}_k + \frac{1}{\beta}\mathbf{P}_k\mathbf{D} - \nabla f_\rho(\mathbf{M}_k)}{\frac{1}{\alpha} + \frac{1}{\beta}}\right\|_F^2, \\ &\quad \text{s.t. } \|\mathbf{M}_i\|_2 = 1,\ i = 1, \cdots, n, \end{aligned} \tag{25}$$

where $\alpha > 0$ is a step size satisfying $\alpha < \rho$. Similar to (19), the above problem has a closed form solution. To compute $\nabla f_\rho(\mathbf{M}_k)$ in (25), we also need to compute $\mathbf{V}_k$ by solving (21).

2. Fix $\mathbf{M} = \mathbf{M}_{k+1}$ and update P by solving

$$\mathbf{P}_{k+1} = \arg\min_{\mathbf{P}} \|\mathbf{M}_{k+1} - \mathbf{PD}\|_F^2, \tag{26}$$

which has a closed form solution $\mathbf{P} = \mathbf{M}_{k+1}\mathbf{D}^\dagger$.

Algorithm 3 Solve (24) by Alternating Minimization.
Initialize: $k = 0$, $\mathbf{P}_k \in \mathbb{R}^{m \times d}$, $\mathbf{M}_k \in \mathbb{R}^{m \times n}$, $\rho > 0$, $\alpha = 0.99\rho$, $\beta > 0$.
Output: $\{\mathbf{P}^*, \mathbf{M}^*\} = \text{AM}(\mathbf{M}_k, \mathbf{P}_k, \rho, \beta)$.
while $k < K$ do
  1) Compute $\mathbf{V}_k$ by solving (21);
  2) Compute $\mathbf{M}_{k+1}$ by solving (25);
  3) Compute $\mathbf{P}_{k+1}$ by solving (26);
  4) $k = k + 1$.
end while

Algorithm 4 Solve (24) by AM with continuation trick.
Initialize: $\rho > 0$, $\alpha = 0.99\rho$, $\beta > 0$, $\eta > 1$, $\mathbf{M}$, $\mathbf{P}$, $t = 0$, $T > 0$.
while $t < T$ do
  1) $(\mathbf{P}, \mathbf{M}) = \text{AM}(\mathbf{M}, \mathbf{P}, \rho, \beta)$ by calling Algorithm 3;
  2) $\rho = \rho/\eta$, $\alpha = 0.99\rho$;
  3) $\beta = \beta/\eta$;
  4) $t = t + 1$.
end while

Iteratively updating P by (26) and M by (25) leads to the Alternating Minimization (AM) method for (24). We summarize the whole procedure of AM in Algorithm 3. It can be easily seen that the per-iteration cost of Algorithm 3 is $O((d + m)n^2 + n^3)$. We can prove that any accumulation point of AM is a stationary point of (24).

Theorem 4: Assume that D in problem (24) is of full row rank. Let $\{(\mathbf{M}_k, \mathbf{P}_k)\}$ be the sequence generated by Algorithm 3. Then the following results hold:

(i) $F(\mathbf{M}_k, \mathbf{P}_k)$ is monotonically decreasing.

(ii) $\mathbf{M}_{k+1} - \mathbf{M}_k \to 0$, $\mathbf{P}_{k+1} - \mathbf{P}_k \to 0$.

(iii) The sequence $\{(\mathbf{M}_k, \mathbf{P}_k)\}$ is bounded.

(iv) Any accumulation point of $\{(\mathbf{M}_k, \mathbf{P}_k)\}$ is a stationary point.

The proof of Theorem 4 can be found in the Appendix. Note that to guarantee the convergence of Algorithm 3, Theorem 4 requires D in problem (24) to be of full row rank. Such an assumption usually holds in CS since $\mathbf{D} \in \mathbb{R}^{d \times n}$ is an overcomplete dictionary with $d < n$.

Though AM is guaranteed to converge, the obtained solution to (24) may be far from optimal to problem (23), which is our original target. In order for (24) to approximate (23) well, $\rho > 0$ should be small. On the other hand, $\beta > 0$ should also be small such that the difference between M and PD is small and thus $\mu(\mathbf{PD})$ can well approximate $\mu(\mathbf{M})$. Similar to Algorithm 2, we use a continuation trick to achieve a good solution to (23). Namely, we begin with relatively large values of $\rho > 0$ and $\beta > 0$ and reduce them gradually. For each fixed pair $(\rho, \beta)$, we solve (24) by AM in Algorithm 3 and use its solution as a new initialization of P and M in AM. We repeat the procedure $T$ times or until $\rho$ and $\beta$ reach predefined small values $\rho_{\min}$ and $\beta_{\min}$. We summarize the procedure of AM with the continuation trick in Algorithm 4.

Finally, we would like to emphasize some advantages of our DMCM-P over previous methods. The main merit of our DMCM-P is that it is the first model which minimizes $\mu(\mathbf{PD})$ directly and the proposed solver also has a convergence guarantee. The algorithms of Elad [9] and Xu et al. [10] are also mutual coherence based methods. But their objectives are suboptimal and their solvers lack convergence guarantee.
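The alternating updates (25) and (26), wrapped in the continuation loop, can be sketched as below. This is a hedged illustration and not the authors' code: the function names are hypothetical, the $\ell_1$-ball projection is a standard sort-based routine, initializing M as the column-normalized PD is an assumption made here, and the defaults follow the experimental settings ($\rho_0 = 0.5$, $\beta = 2$, $\eta = 1.2$).

```python
import numpy as np

def project_l1_ball(A, radius=1.0):
    """Euclidean projection of A onto {V : sum |v_ij| <= radius} (sort-based)."""
    a = np.abs(A).ravel()
    if a.sum() <= radius:
        return A.copy()
    u = np.sort(a)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, u.size + 1)
    idx = np.nonzero(u * k > css - radius)[0][-1]
    theta = (css[idx] - radius) / (idx + 1.0)
    return np.sign(A) * np.maximum(np.abs(A) - theta, 0.0)

def am_step(M, P, D, D_pinv, rho, beta):
    """One pass of the alternating updates: M via (25), then P via (26)."""
    n = M.shape[1]
    alpha = 0.99 * rho
    V = project_l1_ball((M.T @ M - np.eye(n)) / rho)           # subproblem (21)
    grad = M @ (V + V.T)                                       # gradient of f_rho
    C = (M / alpha + (P @ D) / beta - grad) / (1 / alpha + 1 / beta)
    M_new = C / np.linalg.norm(C, axis=0, keepdims=True)       # closed form of (25)
    P_new = M_new @ D_pinv                                     # closed form of (26)
    return M_new, P_new

def dmcm_p(D, m, rho0=0.5, beta0=2.0, eta=1.2, outer=15, inner=1000, seed=0):
    """Sketch of Algorithm 4: AM with continuation over rho and beta."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((m, D.shape[0]))                   # Gaussian init of P
    M = P @ D
    M = M / np.linalg.norm(M, axis=0, keepdims=True)           # assumed init of M
    D_pinv = np.linalg.pinv(D)
    rho, beta = rho0, beta0
    for _ in range(outer):
        for _ in range(inner):
            M, P = am_step(M, P, D, D_pinv, rho, beta)
        rho, beta = rho / eta, beta / eta
    return P, M
```

The pseudoinverse of D is computed once, since D is fixed throughout, so each P update reduces to a single matrix product.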

IV. Numerical Results 

In this section, we conduct several experiments to verify the effectiveness of our proposed methods by comparing them with previous methods. The experiments consist of two parts. The first part shows the values of mutual coherence. The second part shows the signal recovery errors in CS.

Fig. 1: Plots of the means and standard deviations of mutual coherences of M v.s. the number m of measurements. Panels: (a) n = 60; (b) n = 120; (c) n = 180.

Fig. 2: Plots of the means and standard deviations of mutual coherences of PD v.s. the number m of measurements, where D is a standard Gaussian random matrix. Panels: (a) n = 60; (b) n = 120; (c) n = 180.

A. Comparing the Mutual Coherence 

This subsection presents two experiments to show the effectiveness of DMCM and DMCM-P, respectively. In the first experiment, we show that our DMCM is able to construct a matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$ with lower mutual coherence than previous methods do. We compare DMCM with

• Random: random matrix whose elements are drawn independently from the standard normal distribution.

• Elad: the algorithm of Elad [9] with D = I.

• Xu: the algorithm of Xu et al. [10] with D = I.

• Duarte: the algorithm of Duarte-Carvajalino and Sapiro [12] with D = I.

• Welch bound: the Welch bound [13] shown in (13).

Note that the compared algorithms of Elad [9], Xu et al. [10] and Duarte-Carvajalino and Sapiro [12] were designed to find a projection P such that M = PD has low mutual coherence. They can still be compared with our DMCM by setting D as the identity matrix I.

To solve our DMCM model in (18), we run Algorithm 2 for 15 iterations and Algorithm 1 for 1000 iterations. In Algorithm 2, we set $\rho_0 = 0.5$ and $\eta = 1.2$. M is initialized as a Gaussian random matrix. In the method of Elad, we follow [9] to set $t = 0.2$ and $\gamma = 0.95$. In the method of Xu, we try multiple choices of the convex combination parameter $\alpha$ and set it as 0.5, which results in the lowest mutual coherence in most cases. The method of Duarte does not need special parameters. All the compared methods have the same random initializations of P (except Duarte, which has a closed form solution).

The compared methods are tested on three settings with different sizes of $\mathbf{M} \in \mathbb{R}^{m \times n}$: (1) m = [6 : 2 : 16], n = 60; (2) m = [10 : 5 : 35], n = 120; and (3) m = [10 : 10 : 50], n = 180. Note that the constructed matrices may not be the same for the compared methods with different initializations. So for each choice of size $(m, n)$, we repeat the experiment 100 times and record the means and standard deviations of the mutual coherences of the constructed matrices M. The means and standard deviations of mutual coherences v.s. the number m of measurements are shown in Figure 1. It can be seen that the matrix constructed by our DMCM achieves much lower mutual coherences than previous methods do. The main reason is that our DMCM minimizes the mutual coherence of M directly, while the objectives of all the previous methods are indirect. It can also be seen that the standard deviations of our method are close to zero, while some other compared methods may not be stable in some cases. A possible reason is that the solver of our method has a convergence guarantee, while other methods do not.
































































Fig. 3: Plots of the means and standard deviations of mutual coherences of PD v.s. the number m of measurements, where the elements of D are uniformly distributed in [0,1].

TABLE I: Comparison of running time (in seconds) of DMCM-P, Elad, Xu and Duarte on problem (23) under different settings.

Fig. 4: Distributions of the absolute values of $(\mathbf{PD})^T(\mathbf{PD})$.


For the second experiment in this subsection, we show that for a given $\mathbf{D} \in \mathbb{R}^{d \times n}$, our DMCM-P is able to compute a projection $\mathbf{P} \in \mathbb{R}^{m \times d}$ such that $\mathbf{PD} \in \mathbb{R}^{m \times n}$ has low mutual coherence. We choose D to be a Gaussian random matrix in this experiment. To solve our DMCM-P model in (23), we run Algorithm 4 for 15 iterations and Algorithm 3 for 1000 iterations. In Algorithm 4, we set $\rho_0 = 0.5$, $\beta = 2$ and $\eta = 1.2$. P is initialized as a Gaussian random matrix.

We compare our DMCM-P with the algorithms of Elad [9], Xu et al. [10] and Duarte-Carvajalino and Sapiro [12] on the mutual coherence of PD. We test on three settings: (1) m = [6 : 2 : 16], n = 60, d = 30; (2) m = [10 : 5 : 35], n = 120, d = 60; and (3) m = [10 : 10 : 50], n = 180, d = 90. Figure 2 shows the mutual coherence of PD as a function of the number m of measurements. It can be seen that our DMCM-P achieves the best projection such that PD has the lowest mutual coherences in all the three settings. The same holds for the standard deviations. Note that our algorithm does not use any special property of D. So it is expected to work for D in other distributions as well. We test our method in the case that the elements of D are uniformly distributed in [0,1] and report the results in Figure 3. It can be seen that our method still outperforms other methods in both mean and standard deviation.

Furthermore, Figure 4 shows the distribution of the absolute values of inner products between distinct columns of PD with m = 20, n = 120, and d = 60. It can be seen that our DMCM-P has the shortest tail, showing that the number of elements in the Gram matrix that are closer to the ideal Welch bound is larger than for the compared methods. Such a result is consistent with the lowest mutual coherences shown in Figure 2.

Finally, we report the running time of the algorithms of Elad, Xu, Duarte and our DMCM-P in Table I. The settings of the algorithms are the same as those in Figure 2 and the running time is reported based on different choices of m, d and n. It can be seen that Duarte is the fastest method since it has a closed form solution. Our DMCM-P is not very efficient since we use the continuation trick in Algorithm 4, which repeats Algorithm 3 many times. Note that speeding up the algorithm, although valuable, is not the main focus of this paper. Actually, for many applications the projection matrix P can be computed offline. So we leave the speeding-up issue as future work.

B. Comparing the CS Performance 

In this subsection, we apply the projection optimized by 
our DMCM-P to CS. We first generate a T-sparse vector 
$\alpha \in \mathbb{R}^n$, which constitutes a sparse representation of the signal 
$x = D\alpha$, where $x \in \mathbb{R}^d$. The locations of the nonzeros are 
chosen randomly and their values obey a uniform distribution 
on $[-1, 1]$. We choose the dictionary $D \in \mathbb{R}^{d \times n}$ as a Gaussian 
random matrix. Then we apply different projection matrices P, 
namely the one learned by our DMCM-P, a random projection matrix, and 
those produced by the algorithms of Elad [9], Xu et al. [10] and 
Duarte-Carvajalino and Sapiro [12], to generate the compressed 
measurement $y$ via $y = PD\alpha$. 
At last, we solve the sparse recovery problem by OMP to obtain $\alpha$. We 
compare the performance of the projection matrices computed 
by different methods using the relative reconstruction error 
$\|x - x^*\|_2 / \|x^*\|_2$, where $x^*$ is the ground truth. A smaller 
reconstruction error means better CS performance. 

Fig. 5: Signal reconstruction errors vs. number of measurements. 
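
One trial of this protocol can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the experimental code: `omp` is a plain textbook Orthogonal Matching Pursuit written here for self-containment, `relative_error` is a hypothetical helper name, and a Gaussian random P is used in place of an optimized projection.

```python
import numpy as np

def omp(A, y, T):
    """Textbook OMP: greedy T-sparse approximate solution of y = A @ alpha."""
    n = A.shape[1]
    col_norms = np.linalg.norm(A, axis=0)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(T):
        corr = np.abs(A.T @ residual) / col_norms   # normalized correlations
        if support:
            corr[support] = 0.0                     # never reselect an atom
        support.append(int(np.argmax(corr)))
        # Least-squares refit on the current support
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    alpha = np.zeros(n)
    alpha[support] = coef
    return alpha

def relative_error(P, D, T, rng):
    """One trial: draw a T-sparse alpha, measure y = P D alpha, recover by OMP."""
    n = D.shape[1]
    alpha = np.zeros(n)
    idx = rng.choice(n, size=T, replace=False)      # random nonzero locations
    alpha[idx] = rng.uniform(-1.0, 1.0, size=T)     # values uniform on [-1, 1]
    x_true = D @ alpha
    y = P @ x_true
    x_hat = D @ omp(P @ D, y, T)
    return np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

rng = np.random.default_rng(0)
d, n, m, T = 30, 60, 16, 2
D = rng.standard_normal((d, n))    # Gaussian dictionary
P = rng.standard_normal((m, d))    # random projection (assumed baseline)
errs = [relative_error(P, D, T, rng) for _ in range(200)]
mean_err = float(np.mean(errs))    # average relative reconstruction error
```

Replacing `P` by a projection produced by any of the compared methods and averaging over more trials gives one point of an error curve.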

We conduct two experiments in this subsection. The first 
one changes the number m of measurements and the second 
one changes the sparsity level T. For every value of the 
aforementioned parameters we perform 3000 experiments and 
calculate the average relative reconstruction error. 

In the first experiment, we set m = 6 : 2 : 16 (that is, m = 6, 8, ..., 16), n = 60, 
d = 30 and T = 2. Figure 5 shows the average relative 
reconstruction error vs. the number m of measurements (T is 
fixed). The CS performance improves as m increases. Also, as 
expected, all the optimized projection matrices produce better 
CS performance than the random projection does, and our 
proposed DMCM-P consistently outperforms the algorithms 
of Elad, Xu et al. and Duarte-Carvajalino and Sapiro. 

Fig. 7: Signal reconstruction errors vs. number of measurements 
in the noisy case. 

In the second experiment, we set m = 18, n = 180 and 
d = 90 and vary the sparsity level T from 1 to 6. Figure 6 
shows the average relative reconstruction error as a function 
of the sparsity level T (m is fixed). The CS performance 
improves as T decreases. Also, our DMCM-P consistently 
outperforms the random projection and the other deterministic projection 
optimization methods. This is due to the low mutual coherence 
of PD achieved by our optimized projection, as verified 
in the previous experiments. 

We also test the noisy case. We add Gaussian random noise 
with zero mean and variance 0.01 to each element of the 
observation y and then recover the true signal from this noisy y. 
This time we test with D drawn from a different distribution and with 
another choice of the ratio n/d. We generate the elements of D from 
a uniform distribution on [0,1]. We choose m = 6 : 2 : 16, 
d = 40 and n = 60. Figure 7 shows the relative reconstruction 
error vs. the number of measurements. It can be seen that our 
method also achieves the best performance in almost all cases. 
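
The noise model above is stated in terms of variance, whereas most random number generators take a standard deviation, so a variance of 0.01 corresponds to a standard deviation of 0.1. A minimal sketch of the perturbation step, with a hypothetical helper name:

```python
import numpy as np

def add_gaussian_noise(y, variance, rng):
    """Add i.i.d. zero-mean Gaussian noise of the given variance to each entry."""
    # Generator.normal expects the standard deviation, not the variance.
    return y + rng.normal(loc=0.0, scale=np.sqrt(variance), size=y.shape)

rng = np.random.default_rng(0)
y = np.zeros(100_000)                        # placeholder observation vector
y_noisy = add_gaussian_noise(y, 0.01, rng)   # empirical std is close to 0.1
```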

V. Conclusions 

This paper focuses on optimizing the projection matrix 
in CS for reconstructing signals which are sparse in some 
overcomplete dictionary. We develop the first model which 
aims to find a projection P by minimizing the mutual 
coherence of PD directly. We solve the nonconvex problem 
by alternating minimization and prove the convergence. 
Simulation results show that our method does achieve much 
lower mutual coherence of PD, and also leads to better CS 
performance. Considering that mutual coherence is important 
in many applications besides CS, we expect that the proposed 
construction will be useful in those applications as well. 

There is some interesting future work. First, though we 
give the first solver with a convergence guarantee, 
the obtained solution is not guaranteed to be 
globally optimal due to the nonconvexity of the problem. 
It is interesting to investigate when the obtained solution is 
globally optimal. Second, it is valuable to find faster solvers. 
For example, we may consider solving (16) and (22) by the 
Alternating Direction Method of Multipliers (ADMM) after 
introducing some auxiliary variables, which may be more 
efficient than our current solvers. But proving its convergence 
for the nonconvex problems (16) and (22) will be challenging. 


Appendix 

In this section, we give the proof of Theorem 4. 

Definition 2: Let $g$ be a proper and lower 
semicontinuous function. 

1) For a given $x \in \operatorname{dom} g$, the Fréchet subdifferential of $g$ 
at $x$, written as $\hat{\partial} g(x)$, is the set of all vectors $u \in \mathbb{R}^n$ 
which satisfy 

$$\liminf_{y \neq x,\ y \to x} \frac{g(y) - g(x) - \langle u, y - x \rangle}{\|y - x\|} \geq 0.$$

2) The limiting subdifferential, or simply the subdifferential, 
of $g$ at $x \in \mathbb{R}^n$, written as $\partial g(x)$, is defined through the 
following closure process: 

$$\partial g(x) := \{u \in \mathbb{R}^n : \exists\, x_k \to x,\ g(x_k) \to g(x),\ u_k \in \hat{\partial} g(x_k) \to u,\ k \to \infty\}.$$

Proposition 1: The following results hold: 

1) In the nonsmooth context, Fermat's rule remains 
unchanged: if $x \in \mathbb{R}^n$ is a local minimizer of $g$, then 
$0 \in \partial g(x)$. 

2) Let $(x_k, u_k)$ be a sequence such that $x_k \to x$, $u_k \to u$, 
$g(x_k) \to g(x)$ and $u_k \in \partial g(x_k)$. Then $u \in \partial g(x)$. 

3) If $f$ is a continuously differentiable function, then 
$\partial (f + g)(x) = \nabla f(x) + \partial g(x)$. 


Proof of Theorem 4: First, we define 

$$h(M) = \begin{cases} 0, & \text{if } \|M_i\|_2 = 1,\ i = 1, \cdots, n, \\ +\infty, & \text{otherwise.} \end{cases} \tag{27}$$

Then (25) can be rewritten as 

$$M_{k+1} = \arg\min_{M}\ \langle \nabla f_\rho(M_k), M - M_k \rangle + \frac{1}{2\alpha}\|M - M_k\|_F^2 + \frac{1}{2\beta}\|M - P_k D\|_F^2 + h(M).$$

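By the optimality of $M_{k+1}$, we have 

$$h(M_{k+1}) + \langle \nabla f_\rho(M_k), M_{k+1} - M_k \rangle + \frac{1}{2\alpha}\|M_{k+1} - M_k\|_F^2 + \frac{1}{2\beta}\|M_{k+1} - P_k D\|_F^2 \leq h(M_k) + \frac{1}{2\beta}\|M_k - P_k D\|_F^2, \tag{28}$$

and 

$$0 \in \partial h(M_{k+1}) + \nabla f_\rho(M_k) + \frac{1}{\alpha}(M_{k+1} - M_k) + \frac{1}{\beta}(M_{k+1} - P_k D).$$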


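From the Lipschitz continuity of $\nabla f_\rho(M)$, we have 

$$\begin{aligned} F(M_{k+1}, P_k) &= f_\rho(M_{k+1}) + \frac{1}{2\beta}\|M_{k+1} - P_k D\|_F^2 \\ &\leq f_\rho(M_k) + \langle \nabla f_\rho(M_k), M_{k+1} - M_k \rangle + \frac{1}{2\rho}\|M_{k+1} - M_k\|_F^2 + \frac{1}{2\beta}\|M_{k+1} - P_k D\|_F^2. \end{aligned} \tag{29}$$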
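Adding (28) and (29), we have 

$$\begin{aligned} h(M_{k+1}) + F(M_{k+1}, P_k) &\leq h(M_k) + f_\rho(M_k) - \left(\frac{1}{2\alpha} - \frac{1}{2\rho}\right)\|M_{k+1} - M_k\|_F^2 + \frac{1}{2\beta}\|M_k - P_k D\|_F^2 \\ &= h(M_k) + F(M_k, P_k) - \left(\frac{1}{2\alpha} - \frac{1}{2\rho}\right)\|M_{k+1} - M_k\|_F^2. \end{aligned} \tag{30}$$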


Second, from the optimality of $P_{k+1}$ to problem (26), we 
have 

$$F(M_{k+1}, P_{k+1}) \leq F(M_{k+1}, P_k), \tag{31}$$

and 

$$0 = \nabla_P F(M_{k+1}, P_{k+1}) = -\frac{1}{\beta}(M_{k+1} - P_{k+1} D) D^T. \tag{32}$$

By the assumption that $D$ is of full row rank, (32) implies 
that 

$$P_{k+1} = M_{k+1} D^T (D D^T)^{-1}. \tag{33}$$


Combining (30) and (31) leads to 

$$h(M_{k+1}) + F(M_{k+1}, P_{k+1}) \leq h(M_k) + F(M_k, P_k) - \left(\frac{1}{2\alpha} - \frac{1}{2\rho}\right)\|M_{k+1} - M_k\|_F^2. \tag{34}$$

So $h(M_k) + F(M_k, P_k)$ and $F(M_k, P_k)$ are monotonically 
decreasing. Summing all the above inequalities for $k \geq 0$, it 
follows that 

$$h(M_0) + F(M_0, P_0) - \sum_{k=0}^{+\infty} \left(\frac{1}{2\alpha} - \frac{1}{2\rho}\right)\|M_{k+1} - M_k\|_F^2 \geq 0. \tag{35}$$

This implies that $M_{k+1} - M_k \to 0$. Hence $P_{k+1} - P_k \to 0$ 
by using (33). 

Third, note that $F(M, P)$ is coercive, i.e., $F(M, P)$ 
is bounded from below and $F(M, P) \to +\infty$ when 
$\|[M, P]\|_F \to +\infty$. It can be seen from (34) that $F(M_k, P_k)$ 
is bounded. Thus $\{M_k, P_k\}$ is bounded. Then there exist an 
accumulation point $(M^*, P^*)$ and a subsequence $\{M_{k_j}, P_{k_j}\}$ 
such that $(M_{k_j}, P_{k_j}) \to (M^*, P^*)$ as $j \to +\infty$. Now we 
prove that $(M^*, P^*)$ is a stationary point. 

From the optimality condition of $M_{k+1}$ (the inclusion following (28)), we have 

$$\begin{aligned} &\nabla f_\rho(M_{k+1}) - \nabla f_\rho(M_k) - \frac{1}{\alpha}(M_{k+1} - M_k) - \frac{1}{\beta}(P_{k+1} - P_k) D \\ &= -\frac{1}{\alpha}(M_{k+1} - M_k) - \nabla f_\rho(M_k) - \frac{1}{\beta}(M_{k+1} - P_k D) + \nabla f_\rho(M_{k+1}) + \frac{1}{\beta}(M_{k+1} - P_{k+1} D) \\ &\in \partial h(M_{k+1}) + \nabla f_\rho(M_{k+1}) + \frac{1}{\beta}(M_{k+1} - P_{k+1} D) \\ &= \partial h(M_{k+1}) + \nabla_M F(M_{k+1}, P_{k+1}). \end{aligned} \tag{36}$$

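Thus 

$$\begin{aligned} &\left\|\nabla f_\rho(M_{k+1}) - \nabla f_\rho(M_k) - \frac{1}{\alpha}(M_{k+1} - M_k) - \frac{1}{\beta}(P_{k+1} - P_k) D\right\|_F \\ &\leq \|\nabla f_\rho(M_{k+1}) - \nabla f_\rho(M_k)\|_F + \frac{1}{\alpha}\|M_{k+1} - M_k\|_F + \frac{1}{\beta}\|(P_{k+1} - P_k) D\|_F \\ &\leq \frac{1}{\rho}\|M_{k+1} - M_k\|_F + \frac{1}{\alpha}\|M_{k+1} - M_k\|_F + \frac{1}{\beta}\|P_{k+1} - P_k\|_F \|D\|_F \\ &\to 0. \end{aligned} \tag{37}$$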


Since $F(M, P)$ is continuously differentiable, we have 
$F(M_{k_j}, P_{k_j}) \to F(M^*, P^*)$. As $h(M_k) = 0$ for all $k$ and 
the set $\{M : \|M_i\|_2 = 1,\ i = 1, \cdots, n\}$ is closed, we have 
$h(M^*) = 0$ and $F(M_{k_j}, P_{k_j}) + h(M_{k_j}) \to F(M^*, P^*) + 
h(M^*)$. Combining (32), (36), (37) and using Proposition 1, we 
have 

$$0 \in \partial h(M^*) + \nabla F(M^*, P^*). \tag{38}$$

Thus (M*,P*) is a stationary point. ■ 
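
The two updates analyzed in the proof can be sketched numerically. The sketch below rests on assumptions that are not taken from the text: the function names are hypothetical, `G_k` is a random stand-in for the gradient $\nabla f_\rho(M_k)$ (the smoothed objective is not defined in this section), and the M-update is evaluated by completing the square. On the set of matrices with unit-norm columns, $\|M\|_F^2$ is constant, so the M-subproblem reduces to maximizing $\langle M, C \rangle$ with $C = M_k/\alpha - G_k + P_k D/\beta$, which is solved by normalizing the columns of $C$ when none of them is zero.

```python
import numpy as np

def m_step(M_k, G_k, P_k, D, alpha, beta):
    """M-update: minimize the linearized, proximal objective over unit-norm columns."""
    C = M_k / alpha - G_k + (P_k @ D) / beta             # assumed: no zero column
    return C / np.linalg.norm(C, axis=0, keepdims=True)  # column-wise projection

def p_step(M, D):
    """P-update (33): P = M D^T (D D^T)^{-1}, assuming D has full row rank."""
    # Solve (D D^T) P^T = D M^T instead of forming the inverse explicitly.
    return np.linalg.solve(D @ D.T, D @ M.T).T

def m_objective(M, M_k, G_k, P_k, D, alpha, beta):
    """Objective of the M-subproblem, without the indicator h(M)."""
    return (np.sum(G_k * (M - M_k))
            + np.sum((M - M_k) ** 2) / (2 * alpha)
            + np.sum((M - P_k @ D) ** 2) / (2 * beta))

rng = np.random.default_rng(0)
m, d, n = 5, 8, 12
D = rng.standard_normal((d, n))                  # full row rank almost surely
P_k = rng.standard_normal((m, d))
M_k = P_k @ D
M_k /= np.linalg.norm(M_k, axis=0, keepdims=True)  # feasible starting point
G_k = rng.standard_normal((m, n))                # stand-in for the gradient
alpha, beta = 0.5, 1.0

M_next = m_step(M_k, G_k, P_k, D, alpha, beta)
P_next = p_step(M_next, D)
obj_next = m_objective(M_next, M_k, G_k, P_k, D, alpha, beta)
obj_prev = m_objective(M_k, M_k, G_k, P_k, D, alpha, beta)
```

Here `obj_next <= obj_prev` is the numerical counterpart of (28), and `(M_next - P_next @ D) @ D.T` vanishing up to rounding is the counterpart of (32).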